263 research outputs found

    Enabling Privacy-Preserving GWAS in Heterogeneous Human Populations

    The projected increase of genotyping in the clinic and the rise of large genomic databases have led to the possibility of using patient medical data to perform genome-wide association studies (GWAS) on a larger scale and at a lower cost than ever before. Due to privacy concerns, however, access to this data is limited to a few trusted individuals, greatly reducing its impact on biomedical research. Privacy-preserving methods have been suggested as a way of allowing more people access to this precious data while protecting patients. In particular, there has been growing interest in applying the concept of differential privacy to GWAS results. Unfortunately, previous approaches for performing differentially private GWAS are based on rather simple statistics that have some major limitations. In particular, they do not correct for population stratification, a major issue when dealing with the genetically diverse populations present in modern GWAS. To address this concern, we introduce a novel computational framework for performing GWAS that tailors ideas from differential privacy to protect private phenotype information while at the same time correcting for population stratification. This framework allows us to produce privacy-preserving GWAS results based on two of the most commonly used GWAS statistics: EIGENSTRAT and linear mixed model (LMM) based statistics. We test our differentially private statistics, PrivSTRAT and PrivLMM, on both simulated and real GWAS datasets and find that they are able to protect privacy while returning meaningful GWAS results. Comment: To be presented at RECOMB 201
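
As a rough illustration of the differential privacy idea mentioned in this abstract, the sketch below applies the standard Laplace mechanism to a simple counting query (for example, the number of cases carrying a given allele). It is not PrivSTRAT or PrivLMM, whose details the abstract does not give; the function name `private_count`, the toy records, and the choice of query are assumptions made purely for illustration.

```python
import random


def private_count(records, predicate, epsilon, rng):
    """Release a count under epsilon-differential privacy (Laplace mechanism).

    Changing one record changes a count by at most 1, so the sensitivity is 1
    and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for record in records if predicate(record))
    rate = epsilon  # rate of each exponential = 1 / scale = epsilon
    # The difference of two i.i.d. exponentials is Laplace(0, 1 / rate).
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise


# Hypothetical toy data: (carries_minor_allele, is_case) per individual.
records = [(True, True), (True, False), (False, True), (True, True)]
rng = random.Random(0)
noisy = private_count(records, lambda r: r[0] and r[1], epsilon=1.0, rng=rng)
print(f"noisy count of cases carrying the allele: {noisy:.2f}")
```

Smaller values of `epsilon` give stronger privacy but noisier answers; the true count in this toy data is 2.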

    One Size Doesn't Fit All: Measuring Individual Privacy in Aggregate Genomic Data

    Even in the aggregate, genomic data can reveal sensitive information about individuals. We present a new model-based measure, PrivMAF, that provides provable privacy guarantees for aggregate data (namely minor allele frequencies) obtained from genomic studies. Unlike many previous measures that have been designed to measure the total privacy lost by all participants in a study, PrivMAF gives an individual privacy measure for each participant in the study, not just an average measure. These individual measures can then be combined to measure the worst-case privacy loss in the study. Our measure also allows us to quantify the privacy gains achieved by perturbing the data, either by adding noise or binning. Our findings demonstrate that both perturbation approaches offer significant privacy gains. Moreover, we see that these privacy gains can be achieved while minimizing perturbation (and thus maximizing the utility) relative to stricter notions of privacy, such as differential privacy. We test PrivMAF using genotype data from the Wellcome Trust Case Control Consortium, providing a more nuanced understanding of the privacy risks involved in an actual genome-wide association study. Interestingly, our analysis demonstrates that the privacy implications of releasing MAFs from a study can differ greatly from individual to individual. An implementation of our method is available at http://privmaf.csail.mit.edu. Wellcome Trust (London, England) (Award 076113

    Abelian repetitions in partial words

    We study abelian repetitions in partial words, or sequences that may contain some unknown positions or holes. First, we look at the avoidance of abelian pth powers in infinite partial words, where p>2, extending recent results regarding the case where p=2. We investigate, for a given p, the smallest alphabet size needed to construct an infinite partial word with finitely or infinitely many holes that avoids abelian pth powers. We construct in particular an infinite binary partial word with infinitely many holes that avoids 6th powers. Then we show, in a number of cases, that the number of abelian p-free partial words of length n with h holes over a given alphabet grows exponentially as n increases. Finally, we prove that we cannot avoid abelian pth powers under arbitrary insertion of holes in an infinite word.
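
To make the notion of an abelian pth power in a partial word concrete, here is a small sketch. It assumes the usual reading that a factor split into p equal-length blocks is an abelian pth power if its holes can be filled so that every block becomes a permutation of the same word; the abstract does not spell out the definition, so treat this as an assumption. Writing holes as `?` and the function names are also illustrative choices.

```python
from collections import Counter


def is_abelian_power(blocks, hole="?"):
    """Check whether equal-length blocks can be filled into permutations of one word.

    A common letter-count vector exists exactly when the per-letter maximum
    of known-letter counts over all blocks fits within the block length.
    """
    block_len = len(blocks[0])
    counts = [Counter(ch for ch in block if ch != hole) for block in blocks]
    letters = set().union(*counts)
    needed = sum(max(c[letter] for c in counts) for letter in letters)
    return needed <= block_len


def has_abelian_power(word, p, hole="?"):
    """Return True if some factor of `word` is an abelian pth power."""
    n = len(word)
    for block_len in range(1, n // p + 1):
        for start in range(n - p * block_len + 1):
            blocks = [
                word[start + i * block_len:start + (i + 1) * block_len]
                for i in range(p)
            ]
            if is_abelian_power(blocks, hole):
                return True
    return False


print(has_abelian_power("ab?", 2))     # the hole can be filled to give "bb"
print(has_abelian_power("abcab", 2))   # no abelian square in this full word
print(has_abelian_power("ab?ba?", 3))  # blocks "ab", "?b", "a?" can all become {a, b}
```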

    Enabling Privacy-Preserving GWASs in Heterogeneous Human Populations

    The proliferation of large genomic databases offers the potential to perform increasingly larger-scale genome-wide association studies (GWASs). Due to privacy concerns, however, access to these data is limited, greatly reducing their usefulness for research. Here, we introduce a computational framework for performing GWASs that adapts principles of differential privacy (a cryptographic theory that facilitates secure analysis of sensitive data) to both protect private phenotype information (e.g., disease status) and correct for population stratification. This framework enables us to produce privacy-preserving GWAS results based on EIGENSTRAT and linear mixed model (LMM)-based statistics, both of which correct for population stratification. We test our differentially private statistics, PrivSTRAT and PrivLMM, on simulated and real GWAS datasets and find they are able to protect privacy while returning meaningful results. Our framework can be used to securely query private genomic datasets to discover which specific genomic alterations may be associated with a disease, thus increasing the availability of these valuable datasets. National Institutes of Health (U.S.) (Grant GM108348

    Systems-level discovery of quality attributes and candidate pathways for optimized production of human pluripotent stem cell-derived cardiomyocytes

    Numerous protocols exist for differentiation of human pluripotent stem cells (hPSCs) to cardiomyocytes (CMs). Although these methods have improved in efficiency over the past decade, they remain highly variable in their resultant purities, not only among different source hPSC lines but also between batches in the same cell line. This substantial heterogeneity of hPSC-CM product outcomes points to poorly understood, highly sensitive, and uncontrolled variables present within the overall process. Herein, we have undertaken a multi-omic discovery approach to identify key temporal differences in cell attributes between high- and low-purity hPSC-CM differentiations to provide systems-level insights into underlying mechanisms which drive these populations to divergent endpoints. Specifically, we are combining metabolomic, proteomic, lipidomic, and transcriptomic analyses collected throughout the differentiation process for high- and low-purity (as assessed by %cTnT+ via flow cytometry) differentiation batches. In addition to gaining fundamental insights into the underlying biology of the differentiation process, we are extending our analyses to 1) identify putative critical quality attributes for use in on- or at-line analytics for continuous process monitoring, 2) enhance process robustness through the development of protocols aimed at depressing off-target pathways and enhancing on-target ones, and 3) establish potential feedforward/feedback control schemes based on real-time analytics to respond to in-process intermediate quality attributes through rational adjustment of process parameters. To date, we have identified novel putative candidate quality attributes for process monitoring and cellular pathways that might be modulated to augment process robustness in a scaled manufacturing context.
    Beyond standard single-omic analytical workflows, ongoing work is aimed at integrating these data for deepened insight, including functional integration with systems-scale modeling and high-dimensional machine-learning methodologies to extract dynamic relationships among variables over time.

    Pothole Reporting System

    The purpose of this project is to create a pothole detection device that can be attached to the underside of a commercial vehicle. Potholes cost motorists around 6.4 billion dollars annually, demonstrating the need for a system to aid with the detection and reporting of potholes. The four systems we needed to consider for the implementation of this project were the power system, the sensing system, the data processing system, and the reporting and logging system. Power pulled from the vehicle will drive the sensors and the data processing module. The data processing module will analyze the readings from the sensors and output pothole data to the logging and reporting system. The logging and reporting system, located on an Android mobile device, will store the pothole locations on a cloud server.

    Inhibitors of SARS-CoV entry--identification using an internally-controlled dual envelope pseudovirion assay.

    Severe acute respiratory syndrome-associated coronavirus (SARS-CoV) emerged as the causal agent of an endemic atypical pneumonia, infecting thousands of people worldwide. Although a number of promising potential vaccines and therapeutic agents for SARS-CoV have been described, no effective antiviral drug against SARS-CoV is currently available. The intricate, sequential nature of the viral entry process provides multiple valid targets for drug development. Here, we describe a rapid and safe cell-based high-throughput screening system, the dual envelope pseudovirion (DEP) assay, for specifically screening inhibitors of viral entry. The assay system employs a novel dual envelope strategy, using lentiviral pseudovirions as targets whose entry is driven by the SARS-CoV Spike glycoprotein. A second, unrelated viral envelope is used as an internal control to reduce the number of false positives. As an example of the power of this assay, a class of inhibitors is reported with the potential to inhibit SARS-CoV at two steps of the replication cycle: viral entry and particle assembly. This assay system can be easily adapted to screen entry inhibitors against other viruses with the careful selection of matching partner virus envelopes.

    Boosting propagule transport models with individual-specific data from mobile apps

    Management of invasive species and pathogens requires information about the traffic of potential vectors. Such information is often taken from vector traffic models fitted to survey data. Here, user-specific data collected via mobile apps offer new opportunities to obtain more accurate estimates and to analyze how vectors' individual preferences affect propagule flows. However, data voluntarily reported via apps may lack some trip records, adding a significant layer of uncertainty. We show how the benefits of app-based data can be exploited despite this drawback. Based on data collected via an angler app, we built a stochastic model for angler traffic in the Canadian province of Alberta. There, anglers facilitate the spread of whirling disease, a parasite-induced fish disease. The model is temporally and spatially explicit and accounts for individual preferences and repeating behaviour of anglers, helping to address the problem of missing trip records. We obtained estimates of angler traffic between all subbasins in Alberta. The model's accuracy exceeds that of direct empirical estimates even when fewer data were used to fit the model. The results indicate that anglers' local preferences and their tendency to revisit previous destinations reduce the number of long inter-waterbody trips potentially dispersing whirling disease. According to our model, anglers revisit their previous destination in 64% of their trips, making these trips irrelevant for the spread of whirling disease. Furthermore, 54% of fishing trips end in individual-specific spatially contained areas with a mean radius of 54.7 km. Finally, although the fraction of trips that anglers report was unknown, we were able to estimate the total yearly number of fishing trips in Alberta, matching an independent empirical estimate. Comment: Keywords: Angler; Gravity Model; Invasives (Applied Ecology); Modelling (Disease Ecology); Smartphone Apps; Survey Method; Vector; Whirling Diseas

    Avoiding abelian squares in partial words

    Erdős raised the question whether there exist infinite abelian square-free words over a given alphabet, that is, words in which no two adjacent subwords are permutations of each other. It can easily be checked that no such word exists over a three-letter alphabet. However, infinite abelian square-free words have been constructed over alphabets of sizes as small as four. In this paper, we investigate the problem of avoiding abelian squares in partial words, or sequences that may contain some holes. In particular, we give lower and upper bounds for the number of letters needed to construct infinite abelian square-free partial words with finitely or infinitely many holes. Several of our constructions are based on iterating morphisms. In the case of one hole, we prove that the minimal alphabet size is four, while in the case of more than one hole, we prove that it is five. We also investigate the number of partial words of length n with a fixed number of holes over a five-letter alphabet that avoid abelian squares and show that this number grows exponentially with n.
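
The claim that abelian squares cannot be avoided over three letters can be checked by brute force. The sketch below handles full words only (no holes); the function names are illustrative and not taken from the paper. It tests every pair of adjacent equal-length factors for matching letter counts, then enumerates all ternary words of a given length.

```python
from collections import Counter
from itertools import product


def has_abelian_square(word):
    """Return True if two adjacent equal-length factors are permutations of each other."""
    n = len(word)
    for half in range(1, n // 2 + 1):
        for start in range(n - 2 * half + 1):
            left = word[start:start + half]
            right = word[start + half:start + 2 * half]
            if Counter(left) == Counter(right):
                return True
    return False


def count_abelian_square_free(alphabet, length):
    """Count words of the given length over `alphabet` with no abelian square."""
    return sum(
        1
        for letters in product(alphabet, repeat=length)
        if not has_abelian_square("".join(letters))
    )


for length in range(1, 9):
    print(length, count_abelian_square_free("abc", length))
```

For instance, `abacaba` contains no abelian square, while `abcbac` does, because `abc` and `bac` are permutations of each other. The enumeration finds no abelian square-free ternary word of length 8, so no infinite one can exist.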